5 research outputs found

    PCA and K-Means decipher genome

    Full text link
    In this paper, we aim to give a tutorial for undergraduate students studying statistical methods and/or bioinformatics. The students will learn how data visualization can help in genomic sequence analysis. Students start with a fragment of genetic text of a bacterial genome and analyze its structure. By means of principal component analysis they ``discover'' that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of the basic bioinformatics notions. Appendix 1 contains program listings that go along with this exercise. Appendix 2 includes 2D PCA plots of triplet usage in moving frame for a series of bacterial genomes from GC-poor to GC-rich ones. Animated 3D PCA plots are attached as separate gif files. Topology (cluster structure) and geometry (mutual positions of clusters) of these plots depends clearly on GC-content.Comment: 18 pages, with program listings for MatLab, PCA analysis of genomes and additional animated 3D PCA plot

    PCA Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes

    Full text link
    Multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional ``principal object'': a principal cubic complex. This complex is a generalization of linear and non-linear principal manifolds and includes them as a particular case. To construct such an object, we combine a method of topological grammars with the minimization of an elastic energy defined for its embedment into multidimensional data space. The whole complex is presented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs), and the grammars describe how these continua transform during the process of optimal complex construction. The simplest case of a topological grammar (``add a node'', ``bisect an edge'') is equivalent to the construction of ``principal trees'', an object useful in many practical applications. We demonstrate how it can be applied to the analysis of bacterial genomes and for visualization of cDNA microarray data using the ``metro map'' representation. The preprint is supplemented by animation: ``How the topological grammar constructs branching principal components (AnimatedBranchingPCA.gif)''.Comment: 19 pages, 8 figure

    Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization

    Full text link
    Principal manifolds are defined as lines or surfaces passing through ``the middle'' of data distribution. Linear principal manifolds (Principal Components Analysis) are routinely used for dimension reduction, noise filtering and data visualization. Recently, methods for constructing non-linear principal manifolds were proposed, including our elastic maps approach which is based on a physical analogy with elastic membranes. We have developed a general geometric framework for constructing ``principal objects'' of various dimensions and topologies with the simplest quadratic form of the smoothness penalty which allows very effective parallel implementations. Our approach is implemented in three programming languages (C++, Java and Delphi) with two graphical user interfaces (VidaExpert http://bioinfo.curie.fr/projects/vidaexpert and ViMiDa http://bioinfo-out.curie.fr/projects/vimida applications). In this paper we overview the method of elastic maps and present in detail one of its major applications: the visualization of microarray data in bioinformatics. We show that the method of elastic maps outperforms linear PCA in terms of data approximation, representation of between-point distance structure, preservation of local point neighborhood and representing point classes in low-dimensional spaces.Comment: 35 pages 10 figure

    Mathematical Modelling of Cell-Fate Decision in Response to Death Receptor Engagement

    Get PDF
    Cytokines such as TNF and FASL can trigger death or survival depending on cell lines and cellular conditions. The mechanistic details of how a cell chooses among these cell fates are still unclear. The understanding of these processes is important since they are altered in many diseases, including cancer and AIDS. Using a discrete modelling formalism, we present a mathematical model of cell fate decision recapitulating and integrating the most consistent facts extracted from the literature. This model provides a generic high-level view of the interplays between NFκB pro-survival pathway, RIP1-dependent necrosis, and the apoptosis pathway in response to death receptor-mediated signals. Wild type simulations demonstrate robust segregation of cellular responses to receptor engagement. Model simulations recapitulate documented phenotypes of protein knockdowns and enable the prediction of the effects of novel knockdowns. In silico experiments simulate the outcomes following ligand removal at different stages, and suggest experimental approaches to further validate and specialise the model for particular cell types. We also propose a reduced conceptual model implementing the logic of the decision process. This analysis gives specific predictions regarding cross-talks between the three pathways, as well as the transient role of RIP1 protein in necrosis, and confirms the phenotypes of novel perturbations. Our wild type and mutant simulations provide novel insights to restore apoptosis in defective cells. The model analysis expands our understanding of how cell fate decision is made. Moreover, our current model can be used to assess contradictory or controversial data from the literature. Ultimately, it constitutes a valuable reasoning tool to delineate novel experiments
    corecore